Add ignore_args support for tool trajectory evaluation #6079

Open
allenjoshua16 wants to merge 4 commits into google:main from allenjoshua16:feature/4794-ignore-tool-args
@allenjoshua16 allenjoshua16 commented Jun 11, 2026

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

2. Or, if no issue exists, describe the change:

Problem:

Tool trajectory evaluation currently compares both tool names and argument values when matching expected and actual tool calls.

This can cause valid evaluations to fail when tool arguments contain dynamic or non-deterministic values such as timestamps, generated queries, or other runtime-generated content.

Solution:

Added an ignore_args option to ToolTrajectoryCriterion.

When enabled, tool trajectory matching compares only tool names and ignores argument values. This behavior is supported across all existing match modes:

  • EXACT
  • IN_ORDER
  • ANY_ORDER

Updated TrajectoryEvaluator to pass this configuration into the matching logic while preserving the existing behavior by default (ignore_args=False).
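As a rough illustration of the behavior described above (a standalone sketch, not the code in this PR), name-only matching across the three modes could look like the following. The `ToolCall`, `MatchType`, and `trajectories_match` names are hypothetical stand-ins, and the sketch assumes that IN_ORDER and ANY_ORDER tolerate extra actual calls while EXACT does not.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class MatchType(Enum):
    """Hypothetical stand-in for the evaluator's match modes."""
    EXACT = "EXACT"
    IN_ORDER = "IN_ORDER"
    ANY_ORDER = "ANY_ORDER"


@dataclass
class ToolCall:
    """Hypothetical, simplified tool call: a name plus its arguments."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)


def _same_call(expected: ToolCall, actual: ToolCall, ignore_args: bool) -> bool:
    # Names must always agree; arguments are compared only when requested.
    if expected.name != actual.name:
        return False
    return ignore_args or bool(expected.args == actual.args)


def trajectories_match(
    expected: list[ToolCall],
    actual: list[ToolCall],
    match_type: MatchType,
    ignore_args: bool = False,  # default keeps the argument comparison
) -> bool:
    if match_type is MatchType.EXACT:
        # Same length and pairwise-equal calls.
        return len(expected) == len(actual) and all(
            _same_call(e, a, ignore_args) for e, a in zip(expected, actual)
        )
    if match_type is MatchType.IN_ORDER:
        # Expected calls must appear as a subsequence of the actual calls.
        remaining = iter(actual)
        return all(
            any(_same_call(e, a, ignore_args) for a in remaining)
            for e in expected
        )
    # ANY_ORDER: each expected call consumes one distinct actual call.
    unmatched = list(actual)
    for e in expected:
        for i, a in enumerate(unmatched):
            if _same_call(e, a, ignore_args):
                del unmatched[i]
                break
        else:
            return False
    return True
```

With an expected `search` call whose query differs from the actual one only in a generated date, `trajectories_match(..., MatchType.EXACT)` fails in this sketch by default and passes with `ignore_args=True`.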

Added unit tests covering:

  • ignore_args configuration loading
  • matching behavior with different tool arguments
  • existing matching behavior remaining unchanged

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

Tested with:

pytest tests/unittests/evaluation/test_trajectory_evaluator.py -q

@rohityan rohityan self-assigned this Jun 11, 2026
@rohityan rohityan added the eval [Component] This issue is related to evaluation label Jun 11, 2026
@rohityan

Collaborator

Hi @allenjoshua16, thank you for your contribution! We appreciate you taking the time to submit this pull request. Could you please fix the failing mypy-diff tests so that we can proceed with the review?

@rohityan rohityan added the request clarification [Status] The maintainer need clarification or more information from the author label Jun 11, 2026
@allenjoshua16

Author

Thanks for the feedback. I've addressed the mypy-diff failure and pushed an additional commit to the PR.

The issue was caused by mypy reporting a no-any-return error in trajectory_evaluator.py. I updated the implementation to return an explicit bool value and verified that the trajectory evaluator tests still pass locally:
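For readers unfamiliar with this error, here is a minimal sketch of the general pattern (the function names are hypothetical, not taken from trajectory_evaluator.py). When `warn_return_any` is enabled, mypy would typically flag a function that is declared to return `bool` but returns an expression typed `Any`; wrapping the expression in `bool(...)` is one common way to make the return type explicit.

```python
from typing import Any


def args_match_untyped(expected: Any, actual: Any) -> bool:
    # Comparing two Any values yields Any, so mypy (with warn_return_any)
    # reports: Returning Any from function declared to return "bool"
    # [no-any-return]
    return expected == actual


def args_match(expected: Any, actual: Any) -> bool:
    # Converting the comparison result makes the return value an explicit
    # bool, which satisfies the declared return type.
    return bool(expected == actual)
```

Both functions behave the same at runtime for ordinary dict arguments; only the static type of the returned expression differs.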

pytest tests/unittests/evaluation/test_trajectory_evaluator.py -q
# 35 passed

Please let me know if there are any other issues I should address.

Labels

eval [Component] This issue is related to evaluation
request clarification [Status] The maintainer need clarification or more information from the author

Development

Successfully merging this pull request may close these issues.

Add ignore_args option to tool trajectory evaluation

2 participants